To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers $13$ common text generation tasks and their corresponding $83$ datasets, and further incorporates $45$ PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement $4$ efficient training strategies and provide $4$ generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be fulfilled in a unified way. Despite its rich functionality, the library is easy to use, either through the friendly Python API or the command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at https://github.com/RUCAIBox/TextBox.
Generative models have been widely applied to solve extractive tasks, where parts of the input are extracted to form the desired output, and have achieved significant success. For example, in extractive question answering (QA), generative models have consistently yielded state-of-the-art results. In this work, we identify the issue of tokenization inconsistency, which is commonly neglected in training these models. When the input and output are tokenized inconsistently by the tokenizer, the extractive nature of these tasks is damaged, which leads to performance drops as well as hallucination. We propose a simple yet effective fix to this issue and conduct a case study on extractive QA. We show that, with consistent tokenization, the model performs better on both in-domain and out-of-domain datasets, with a notable average gain of +1.7 F2 when a BART model is trained on SQuAD and evaluated on 8 QA datasets. Further, the model converges faster and becomes less likely to generate out-of-context answers. With these findings, we would like to call for more attention to how tokenization should be done when solving extractive tasks, and we recommend applying consistent tokenization during training.
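To make the issue concrete, here is a minimal Python sketch with a hypothetical toy tokenizer (`toy_tokenize`) that mimics the leading-space marker `Ġ` used by byte-level BPE tokenizers such as BART's. Tokenizing the answer on its own drops that marker, so the target tokens no longer appear verbatim in the tokenized input. The consistent variant, assumed here as one natural realization of the fix rather than the authors' exact code, takes the answer tokens directly from the tokenized context using the answer's character span.

```python
import re
from typing import List, Tuple

def toy_tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Hypothetical stand-in for a byte-level BPE tokenizer.

    Words and punctuation become single tokens; a token preceded by a
    space gets a leading 'Ġ', as in GPT-2/BART-style tokenizers.
    Returns (token, char_start, char_end) triples.
    """
    tokens = []
    for m in re.finditer(r"\w+|[^\w\s]", text):
        marker = "Ġ" if m.start() > 0 and text[m.start() - 1] == " " else ""
        tokens.append((marker + m.group(), m.start(), m.end()))
    return tokens

def answer_tokens_separately(answer: str) -> List[str]:
    # Inconsistent: the answer is tokenized without its left context.
    return [tok for tok, _, _ in toy_tokenize(answer)]

def answer_tokens_from_context(context: str, answer_start: int, answer: str) -> List[str]:
    # Consistent: reuse the context tokens that fall inside the answer span.
    answer_end = answer_start + len(answer)
    return [tok for tok, start, end in toy_tokenize(context)
            if start >= answer_start and end <= answer_end]

context = "The capital of France is Paris."
start = context.index("Paris")
context_tokens = [tok for tok, _, _ in toy_tokenize(context)]
print(answer_tokens_separately("Paris"))                    # ['Paris'] (absent from context_tokens)
print(answer_tokens_from_context(context, start, "Paris"))  # ['ĠParis']
```

With the consistent variant every target token is also an input token, so the training target remains a true extraction from the input.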
There has been great progress in unifying various table-to-text tasks using a single encoder-decoder model trained via multi-task learning (Xie et al., 2022). However, existing methods typically encode task information with a simple dataset name as a prefix to the encoder. This not only limits the effectiveness of multi-task learning, but also hinders the model's ability to generalize to new domains or tasks that were not seen during training, which is crucial for real-world applications. In this paper, we propose compositional task configurations, a set of prompts prepended to the encoder to improve the cross-task generalization of unified models. We design the task configurations to explicitly specify the task type as well as its input and output types. We show that this not only allows the model to better learn shared knowledge across different tasks during training, but also allows us to control the model by composing new configurations that apply novel input-output combinations in a zero-shot manner. Experiments on ten table-to-text tasks show that our method outperforms the UnifiedSKG baseline by noticeable margins in both in-domain and zero-shot settings, with average improvements of +0.5 and +12.6, respectively, when using a T5-large backbone.
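The contrast between a dataset-name prefix and a compositional configuration can be sketched in Python as follows. The bracketed segment format, the class name `TaskConfig`, and the example labels are all hypothetical; the abstract only states that a configuration specifies the task type and the input and output types.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TaskConfig:
    task: str                # task type, e.g. "question answering"
    inputs: Tuple[str, ...]  # input types, e.g. ("table", "question")
    output: str              # output type, e.g. "short answer"

    def to_prompt(self) -> str:
        # One segment per component, so segments can be recombined freely.
        return (f"[task] {self.task} [input] {', '.join(self.inputs)} "
                f"[output] {self.output}")

def dataset_prefix_input(dataset_name: str, source: str) -> str:
    # Baseline style: the dataset name is the only task signal.
    return f"{dataset_name}: {source}"

def configured_input(config: TaskConfig, source: str) -> str:
    # Compositional style: the configuration is prepended to the encoder input.
    return f"{config.to_prompt()} {source}"

# Configurations seen during training (hypothetical examples).
table_qa = TaskConfig("question answering", ("table", "question"), "short answer")
table_summ = TaskConfig("summarization", ("table",), "long text")
# A new combination composed at test time for zero-shot use.
table_qa_long = TaskConfig("question answering", ("table", "question"), "long text")

print(configured_input(table_qa_long, "<linearized table> <question>"))
```

Because each component is its own segment, a model trained on `table_qa` and `table_summ` has seen every segment of `table_qa_long`, even though that exact combination never appeared in training.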
Image-text retrieval (ITR) is a challenging task in multimodal information processing because of the semantic gap between modalities. In recent years, researchers have made great progress in exploring accurate alignment between images and text. However, existing works mainly focus on the fine-grained alignment between image regions and sentence fragments, ignoring the guidance that global context information can provide. In fact, integrating local fine-grained information with global context information can provide more semantic clues for retrieval. In this paper, we propose a novel Hierarchical Graph Alignment Network (HGAN) for image-text retrieval. First, to capture comprehensive multimodal features, we construct feature graphs for the image and text modalities, respectively. Then, a multi-granularity shared space is established with a specially designed Multi-granularity Feature Aggregation and Rearrangement (MFAR) module, which enhances the semantic correspondence between local and global information and yields more accurate feature representations for both modalities. Finally, the resulting image and text features are further refined through three-level similarity functions to achieve hierarchical alignment. To validate the proposed model, we perform extensive experiments on the MS-COCO and Flickr30K datasets. Experimental results show that HGAN outperforms state-of-the-art methods on both datasets, demonstrating the effectiveness and superiority of our model.
Recent work on the Lottery Ticket Hypothesis has shown that pre-trained language models (PLMs) contain smaller matching subnetworks (winning tickets) that can reach accuracy comparable to the original models. However, these tickets have been shown to be not robust to adversarial examples, performing even worse than their PLM counterparts. To address this problem, we propose a novel method based on learning binary weight masks to identify robust tickets hidden in the original PLMs. Since the loss is not differentiable with respect to the binary mask, we assign the hard concrete distribution to the masks and encourage their sparsity using a smooth approximation of L0 regularization. Furthermore, we design an adversarial loss objective to guide the search for robust tickets and ensure that the tickets perform well in both accuracy and robustness. Experimental results show that the proposed method significantly improves over previous work in adversarial robustness evaluation.
Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on a suitable design of loss functions. Popular loss functions, including the cross-entropy and Dice losses, often fall short in boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function tailored to reflect boundary information and thereby enhance boundary detection. As the contrast between the segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss, which is infused with a statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component.
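The statistical ingredient can be illustrated with Welch's two-sample t statistic, computed inside one boundary patch between the predicted foreground probabilities of ground-truth foreground and background pixels. The Python sketch below is an assumption throughout: the abstract does not give the exact form of the PTA loss, so the patch-wise penalty `exp(-t)` and its combination with binary cross-entropy are hypothetical choices made only for illustration.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Patch = Tuple[Sequence[float], Sequence[int]]  # (predicted probs, 0/1 labels)

def welch_t(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Welch's two-sample t statistic (each group needs at least 2 values)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / (se + eps)

def patch_penalty(probs: Sequence[float], labels: Sequence[int]) -> float:
    # Hypothetical: small when foreground pixels receive clearly higher
    # probabilities than background pixels within this boundary patch.
    fg = [p for p, y in zip(probs, labels) if y == 1]
    bg = [p for p, y in zip(probs, labels) if y == 0]
    t = max(-20.0, min(20.0, welch_t(fg, bg)))  # clamp to avoid overflow
    return math.exp(-t)

def bce(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    return -statistics.mean(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                            for p, y in zip(probs, labels))

def t_augmented_loss(patches: List[Patch], lam: float = 0.1) -> float:
    """Pixel-wise BCE over all patches plus the mean piece-wise t penalty."""
    all_p = [p for probs, _ in patches for p in probs]
    all_y = [y for _, labels in patches for y in labels]
    penalty = statistics.mean(patch_penalty(p, y) for p, y in patches)
    return bce(all_p, all_y) + lam * penalty
```

Computing the statistic per patch rather than over the whole image keeps the test local to each stretch of boundary, which is one reading of "piece-wise".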
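Understanding the decision process of neural networks is hard. One vital method for explanation is to attribute a network's decision to pivotal features. Although many algorithms have been proposed, most of them solely improve faithfulness to the model. However, real environments contain many random noises, which may lead to large fluctuations in the explanations. More seriously, recent works show that explanation algorithms are vulnerable to adversarial attacks. All of this makes explanations hard to trust in real scenarios. To bridge this gap, we propose a model-agnostic method, Median Test for Feature Attribution (METFA), to quantify the uncertainty and increase the stability of explanation algorithms with theoretical guarantees. METFA has the following two functions: (1) it examines whether a feature is significantly important or unimportant and generates a METFA-significant map to visualize the results; (2) it computes the confidence interval of a feature attribution score and generates a METFA-smoothed map to increase the stability of the explanation. Experiments show that METFA improves the visual quality of explanations and significantly reduces instability while maintaining faithfulness. To quantitatively evaluate the faithfulness of an explanation under different noise settings, we further propose several robust faithfulness metrics. Experimental results show that METFA-smoothed explanations can significantly increase robust faithfulness. In addition, we use two scenarios to show METFA's potential in applications. First, when applied to a state-of-the-art explanation method to locate context bias in semantic segmentation models, METFA-significant explanations use far smaller regions to maintain 99%+ faithfulness. Second, when tested with different explanation-oriented attacks, METFA can help defend against vanilla as well as adaptive adversarial attacks on explanations.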
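The abstract does not spell out the test, but a standard distribution-free way to ask whether a feature's attribution is reliably positive under random noise is the classical sign test for a median, paired with an order-statistic confidence interval for that median. The Python sketch below assumes METFA behaves similarly; the function names and the idea of collecting `scores` by re-running an attribution method on noise-perturbed inputs are illustrative assumptions.

```python
import math
from typing import List, Tuple

def binom_cdf(k: int, n: int) -> float:
    """P(X <= k) for X ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n

def sign_test(scores: List[float], alpha: float = 0.05) -> str:
    """Label one feature from its attribution scores under random noise."""
    nonzero = [s for s in scores if s != 0]
    n = len(nonzero)
    n_pos = sum(s > 0 for s in nonzero)
    p_important = 1.0 - binom_cdf(n_pos - 1, n)  # P(X >= n_pos) under H0: median 0
    p_unimportant = binom_cdf(n_pos, n)          # P(X <= n_pos) under H0
    if p_important < alpha:
        return "important"
    if p_unimportant < alpha:
        return "unimportant"
    return "undetermined"

def median_ci(scores: List[float], alpha: float = 0.05) -> Tuple[float, float]:
    """Distribution-free CI [x_(k), x_(n-k+1)] for the median (1-indexed order stats)."""
    xs = sorted(scores)
    n = len(xs)
    k = 0  # grow k while P(X <= k) <= alpha/2, i.e. while k + 1 is still valid
    while k + 1 <= n and binom_cdf(k, n) <= alpha / 2:
        k += 1
    if k == 0:
        # Too few samples for the requested level; the full range has lower coverage.
        return (xs[0], xs[-1])
    return (xs[k - 1], xs[n - k])
```

A smoothed map could then, for example, report a point inside `median_ci` per feature rather than a single noisy score; how METFA aggregates the interval is not stated in the abstract.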
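Deep neural networks (DNNs) are vulnerable to adversarial examples, in which DNNs are misled into false outputs by inputs containing imperceptible perturbations. Adversarial training, a reliable and effective defense, can significantly reduce the vulnerability of neural networks and has become the de facto standard for robust learning. While many recent works follow a data-centric philosophy, such as generating better adversarial examples or using generative models to produce additional training data, we look back at the models themselves and revisit adversarial robustness from the perspective of deep feature distribution as an insightful complement. In this paper, we propose Branch Orthogonality adveRsarial Training (BORT), which obtains state-of-the-art performance using solely the original dataset for adversarial training. To realize our design idea of integrating multiple orthogonal solution spaces, we leverage a simple and straightforward multi-branch neural network that withstands adversarial attacks with no increase in inference time. We heuristically propose a corresponding loss function, the branch-orthogonal loss, to make each solution space of the multi-branch model orthogonal. We evaluate our approach on CIFAR-10, CIFAR-100, and SVHN against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$. Exhaustive experiments show that our method surpasses all state-of-the-art methods without any tricks. Compared to all methods that do not use additional data for training, our models achieve 67.3% and 41.5% robust accuracy on CIFAR-10 and CIFAR-100, respectively (improving upon the state of the art by +7.23% and +9.07%). We also outperform methods that use training sets far larger than ours. All our models and code are available online at https://github.com/huangd1999/bort.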
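A branch-orthogonal penalty of this kind can be sketched as the mean squared cosine similarity between the branches' vectors, which is zero exactly when every pair of (nonzero) vectors is orthogonal. This Python sketch is hypothetical: the abstract states neither whether the loss acts on weights or on features nor its exact form, and the weight `lam` is an assumed hyperparameter.

```python
import math
from itertools import combinations
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-12) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)

def branch_orthogonal_loss(branch_vectors: List[Sequence[float]]) -> float:
    """Mean squared cosine similarity over all pairs of branches."""
    pairs = list(combinations(branch_vectors, 2))
    if not pairs:
        raise ValueError("need at least two branches")
    return sum(cosine(u, v) ** 2 for u, v in pairs) / len(pairs)

def total_loss(adv_ce: float, branch_vectors: List[Sequence[float]],
               lam: float = 0.1) -> float:
    # Adversarial cross-entropy plus the orthogonality regularizer.
    return adv_ce + lam * branch_orthogonal_loss(branch_vectors)
```

Squaring the cosine penalizes aligned and anti-aligned branches equally, so the only minimizer is mutual orthogonality rather than opposition.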
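We introduce the initial release of our software Robustar, which aims to improve the robustness of vision classification machine learning models from a data-driven perspective. Building on the recent understanding that the lack of robustness in machine learning models stems from their tendency to learn spurious features, we aim to solve this problem at its root, from the data perspective, by removing spurious features from the data before training. In particular, we introduce software that helps users better prepare data for training image classification models by allowing them to annotate spurious features at the pixel level of images. To facilitate this process, our software also leverages recent advances to help identify potential images and pixels worthy of attention, and to continue training with the newly annotated data. Our software is hosted at the GitHub repository https://github.com/haohanwang/robustar.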
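The success of existing video super-resolution (VSR) algorithms stems mainly from exploiting temporal information from neighboring frames. However, none of these methods discusses the influence of temporal redundancy in patches with stationary objects and background, and they usually use all the information in the adjacent frames without any discrimination. In this paper, we observe that temporal redundancy adversely affects information propagation, which limits the performance of most existing VSR methods. Motivated by this observation, we aim to improve existing VSR algorithms by handling temporally redundant patches in an optimized manner. We develop two simple yet effective plug-and-play methods that improve the performance of existing local and non-local propagation-based VSR algorithms on widely used public videos. To evaluate the robustness and performance of existing VSR algorithms more comprehensively, we also collect a new dataset containing a variety of public videos as a test set. Extensive evaluations show that the proposed methods can significantly improve the performance of existing VSR methods on videos collected from in-the-wild scenarios while maintaining their performance on existing commonly used datasets. The code is available at https://github.com/hyhsimon/boosted-vsr.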
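One simple way to flag temporally redundant patches, assumed here only for illustration because the abstract does not describe the criterion, is to threshold the mean absolute difference between co-located patches of consecutive frames. The `patch` size and `thresh` values below are hypothetical; a propagation module could then reuse or down-weight information in the flagged patches instead of treating every patch alike.

```python
from typing import List, Tuple

Frame = List[List[float]]  # grayscale frame as rows of pixel values

def redundant_patches(prev: Frame, curr: Frame, patch: int = 4,
                      thresh: float = 1.0) -> List[Tuple[int, int]]:
    """Top-left (row, col) of patches whose mean absolute change is below thresh."""
    h, w = len(curr), len(curr[0])
    flagged = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            diff = sum(abs(curr[i][j] - prev[i][j])
                       for i in range(r, r + patch)
                       for j in range(c, c + patch))
            if diff / (patch * patch) < thresh:
                flagged.append((r, c))  # nearly static: temporally redundant
    return flagged

prev = [[0.0] * 8 for _ in range(8)]
curr = [row[:] for row in prev]
for i in range(4):
    for j in range(4):
        curr[i][j] = 10.0  # motion only in the top-left patch
print(redundant_patches(prev, curr))  # [(0, 4), (4, 0), (4, 4)]
```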